@vstinner vstinner commented Sep 22, 2025

Replace PyBytes_FromStringAndSize() and _PyBytes_Resize() with the PyBytesWriter API.

vstinner commented Sep 22, 2025

Benchmark:

import pyperf
runner = pyperf.Runner()
sizes = (3, 100, 1000)
for size in sizes:
    runner.timeit(f'{size:,} ASCII chars',
        setup=f's="x"*{size}',
        stmt='s.encode("utf7")')
for size in sizes:
    runner.timeit(f'{size:,} UCS-1 chars',
        setup=f's=chr(0xe9) * {size}',
        stmt='s.encode("utf7")')
for size in sizes:
    runner.timeit(f'{size:,} UCS-2 chars',
        setup=f's=chr(0x20ac) * {size}',
        stmt='s.encode("utf7")')
for size in sizes:
    runner.timeit(f'{size:,} UCS-4 chars',
        setup=f's=chr(0x10ffff) * {size}',
        stmt='s.encode("utf7")')
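
For reference, the four input families above can be inspected at a small size. This is only an illustrative sketch, not part of the benchmark: the `samples` and `encoded` names are made up for this example. It shows that ASCII letters pass through unchanged, while non-ASCII characters are written as base64 of their UTF-16 code units, so the UCS-4 input (which needs surrogate pairs) produces the longest output.

```python
# Illustrative only: the same four input families as the benchmark, at size 3.
samples = {
    "ASCII": "x" * 3,
    "UCS-1": chr(0xE9) * 3,
    "UCS-2": chr(0x20AC) * 3,
    "UCS-4": chr(0x10FFFF) * 3,
}

# Encode each sample with the codec being benchmarked.
encoded = {name: text.encode("utf7") for name, text in samples.items()}

for name, data in encoded.items():
    # UTF-7 output is pure ASCII bytes, and decoding restores the input.
    assert data.isascii()
    assert data.decode("utf7") == samples[name]
    print(f"{name}: {len(samples[name])} chars -> {len(data)} bytes: {data!r}")
```

The output size depends on the input contents, which is why the encoder has to size or resize its output buffer as it goes.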

Results with CPU isolation and python -m pyperf system tune:

| Benchmark | ref | pep782 |
|---|---|---|
| 3 ASCII chars | 487 ns | 480 ns: 1.01x faster |
| 100 ASCII chars | 785 ns | 831 ns: 1.06x slower |
| 1,000 ASCII chars | 2.46 us | 2.93 us: 1.19x slower |
| 3 UCS-1 chars | 499 ns | 487 ns: 1.02x faster |
| 1,000 UCS-1 chars | 5.58 us | 5.72 us: 1.03x slower |
| 3 UCS-2 chars | 498 ns | 487 ns: 1.02x faster |
| 100 UCS-2 chars | 1.14 us | 1.18 us: 1.04x slower |
| 1,000 UCS-2 chars | 6.01 us | 6.04 us: 1.01x slower |
| 100 UCS-4 chars | 1.63 us | 1.65 us: 1.01x slower |
| 1,000 UCS-4 chars | 10.4 us | 10.4 us: 1.00x slower |
| Geometric mean | (ref) | 1.02x slower |

Benchmark hidden because not significant (2): 100 UCS-1 chars, 3 UCS-4 chars
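
As a rough cross-check of the table, the speed ratios and the geometric mean can be recomputed from the displayed values. This is a sketch under assumptions: it uses the rounded numbers shown above (pyperf works from the raw measurements, so the last digit may differ), it leaves out the two hidden benchmarks, and the `describe` helper is hypothetical.

```python
import math

# (benchmark, ref, pep782) copied from the table, converted to nanoseconds.
rows = [
    ("3 ASCII chars", 487, 480),
    ("100 ASCII chars", 785, 831),
    ("1,000 ASCII chars", 2460, 2930),
    ("3 UCS-1 chars", 499, 487),
    ("1,000 UCS-1 chars", 5580, 5720),
    ("3 UCS-2 chars", 498, 487),
    ("100 UCS-2 chars", 1140, 1180),
    ("1,000 UCS-2 chars", 6010, 6040),
    ("100 UCS-4 chars", 1630, 1650),
    ("1,000 UCS-4 chars", 10400, 10400),
]


def describe(ref, new):
    """Format new/ref the way the table does: 'N.NNx slower' or 'N.NNx faster'."""
    ratio = new / ref
    if ratio >= 1.0:
        return f"{ratio:.2f}x slower"
    return f"{1 / ratio:.2f}x faster"


for name, ref, new in rows:
    print(f"{name}: {describe(ref, new)}")

# Geometric mean of the new/ref ratios, computed through logarithms.
geomean = math.exp(sum(math.log(new / ref) for _, ref, new in rows) / len(rows))
print(f"Geometric mean: {geomean:.3f} (above 1.0 means slower than ref)")
```

Apart from the 1,000 ASCII chars row, every ratio is within about 6% of 1.0.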

I'm not sure what's going on with 1,000 UCS-1 chars. It looks like a hiccup in the benchmark, not a real regression.

The change uses the same memory allocation strategy, so it should have basically no impact on performance.

@vstinner

1,000 UCS-4 chars: 6.96 us => 6.48 us: 1.07x faster

If I re-run the benchmark with this change, I get 5.84 us: 1.19x faster. Hmm, it seems like the benchmark is not reliable :-(

@vstinner

Benchmark on this change: python -m pyperf timeit -s 's=chr(0x10ffff) * 1000' 's.encode("utf7")'

  • Build 1: Mean +- std dev: 5.95 us +- 0.06 us
  • Build 2: Mean +- std dev: 5.72 us +- 0.08 us
  • Build 3: Mean +- std dev: 6.34 us +- 0.51 us

New try with CPU isolation and python -m pyperf system tune:

  • Build 1: Mean +- std dev: 10.4 us +- 0.0 us
  • Build 2: Mean +- std dev: 10.4 us +- 0.0 us
  • Build 3: Mean +- std dev: 10.4 us +- 0.0 us
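
To put these numbers in perspective, the spread between the build means can be compared with the 1.07x difference quoted earlier. This is a minimal sketch, assuming the three builds are rebuilds of the same change; `relative_spread` is a hypothetical helper, not a pyperf function.

```python
# Mean timings in microseconds, copied from the two lists above.
untuned_us = [5.95, 5.72, 6.34]
tuned_us = [10.4, 10.4, 10.4]


def relative_spread(means):
    """Return (max - min) / min: how far apart the means are, as a fraction."""
    return (max(means) - min(means)) / min(means)


print(f"untuned spread: {relative_spread(untuned_us):.1%}")
print(f"tuned spread:   {relative_spread(tuned_us):.1%}")
```

Without tuning, the means differ by more than 10% from build to build, which is larger than the 1.07x effect being measured. With CPU isolation and `pyperf system tune`, the three means agree to the displayed precision.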

@vstinner

I'm not sure what's going on with 1,000 UCS-1 chars. It looks like a hiccup in the benchmark, not a real regression.

If I re-run the benchmark, it becomes faster: Mean +- std dev: [ref] 486 ns +- 3 ns -> [pep782] 484 ns +- 4 ns: 1.01x faster

@vstinner vstinner merged commit c863349 into python:main Sep 22, 2025
47 checks passed
@vstinner vstinner deleted the utf7 branch September 22, 2025 20:49